add setting to define filename pattern for part exports #1490

Open
arthurpassos wants to merge 4 commits into antalya-26.1 from export_filename_pattern_setting

Conversation

@arthurpassos
Collaborator

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Add a setting to define the filename pattern for part exports, which helps with sharding. Port of the unmerged and unreviewed PR #1383.

Documentation entry for user-facing changes

...

CI/CD Options

Exclude tests:

  • Fast test
  • Integration Tests
  • Stateless tests
  • Stateful tests
  • Performance tests
  • All with ASAN
  • All with TSAN
  • All with MSAN
  • All with UBSAN
  • All with Coverage
  • All with Aarch64
  • All Regression
  • Disable CI Cache

Regression jobs to run:

  • Fast suites (mostly <1h)
  • Aggregate Functions (2h)
  • Alter (1.5h)
  • Benchmark (30m)
  • ClickHouse Keeper (1h)
  • Iceberg (2h)
  • LDAP (1h)
  • Parquet (1.5h)
  • RBAC (1.5h)
  • SSL Server (1h)
  • S3 (2h)
  • Tiered Storage (2h)

@github-actions

github-actions bot commented Mar 9, 2026

Workflow [PR], commit [2ca197c]

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d69971b4f2

manifest.parquet_parallel_encoding = json->getValue<bool>("parquet_parallel_encoding");
manifest.max_bytes_per_file = json->getValue<size_t>("max_bytes_per_file");
manifest.max_rows_per_file = json->getValue<size_t>("max_rows_per_file");
manifest.filename_pattern = json->getValue<String>("filename_pattern");

P1 Badge Preserve manifest backward compatibility for filename_pattern

Deserialization now requires filename_pattern unconditionally, but metadata written by earlier versions does not include this key. Any node that reads an older exports/.../metadata.json (for example while checking existing exports or canceling an export in StorageReplicatedMergeTree) will throw during fromJsonString, breaking in-flight export management after upgrade. Make this field optional and fall back to the default pattern when absent.
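
A minimal sketch of the optional-field fallback suggested above. It uses a plain `std::map` as a stand-in for the parsed JSON object so that it compiles on its own; `JsonObject`, `readFilenamePattern`, and `DEFAULT_FILENAME_PATTERN` are hypothetical names for illustration, not code from this PR. The default value is assumed to match the documented default of the setting.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for the parsed metadata.json object.
using JsonObject = std::map<std::string, std::string>;

// Assumed default, taken from the documented default of the setting.
static const std::string DEFAULT_FILENAME_PATTERN = "{part_name}_{checksum}";

// Returns the stored pattern, or the default when the key is absent,
// as it would be in metadata written before the setting existed.
std::string readFilenamePattern(const JsonObject & json)
{
    auto it = json.find("filename_pattern");
    return it == json.end() ? DEFAULT_FILENAME_PATTERN : it->second;
}
```

With this shape, a manifest written by an earlier version would deserialize to the default pattern instead of throwing, while a manifest that carries the key keeps its stored value.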

Collaborator

@arthurpassos, what do you think about ^^?

Collaborator Author

Nobody is using this feature yet, it is ok to introduce backwards incompatible changes like this. We literally have 0 users so far.

- **Type**: `String`
- **Default**: `{part_name}_{checksum}`
- **Description**: Pattern for the filename of the exported merge tree part. The `{part_name}` and `{checksum}` placeholders are calculated and replaced on the fly. Additional macros are supported.

Collaborator

Why do we duplicate part_export.md content here?

Collaborator Author

Well, this is export partition (slightly different feature), and at some point there might be settings that are not supported by export partition and only by export part.

I don't have a good answer tbh.

Macros::MacroExpansionInfo macro_info;
macro_info.table_id = storage_id;
filename = local_context->getMacros()->expand(filename, macro_info);

Collaborator

Why do we need special logic for {part_name} and {checksum}?
In other words, why don't we put it inside expand()?

Collaborator Author

Because part_name and checksum are calculated on the fly based on the data part being exported. They are not meant to be extracted from macros; it would not even work, tbh.
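
To illustrate the ordering, here is a minimal, self-contained sketch: the per-part placeholders are substituted first from values known only at export time, and the remaining `{...}` names are then looked up in a static, server-wide map. The map is a simplified stand-in for macro expansion, and `replaceAll`, `expandPartPlaceholders`, and `expandServerMacros` are hypothetical helpers, not code from this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Replaces every occurrence of `from` in `text` with `to`.
std::string replaceAll(std::string text, const std::string & from, const std::string & to)
{
    assert(!from.empty()); // an empty needle would never advance
    for (std::size_t pos = text.find(from); pos != std::string::npos; pos = text.find(from, pos + to.size()))
        text.replace(pos, from.size(), to);
    return text;
}

// Step 1: values that differ for every exported part.
std::string expandPartPlaceholders(std::string pattern, const std::string & part_name, const std::string & checksum)
{
    pattern = replaceAll(std::move(pattern), "{part_name}", part_name);
    return replaceAll(std::move(pattern), "{checksum}", checksum);
}

// Step 2: values that are fixed per server (simplified stand-in for macro expansion).
std::string expandServerMacros(std::string text, const std::map<std::string, std::string> & macros)
{
    for (const auto & macro : macros)
        text = replaceAll(std::move(text), "{" + macro.first + "}", macro.second);
    return text;
}
```

For example, with the pattern `{shard}/{part_name}_{checksum}`, a made-up part name `all_1_1_0`, and a made-up checksum `abc123`, step 1 would leave `{shard}` untouched, and step 2 with `shard = 01` would yield `01/all_1_1_0_abc123`.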

Collaborator

@ilejn ilejn left a comment

LGTM

@arthurpassos arthurpassos added the port-antalya (PRs to be ported to all new Antalya releases) and antalya-26.1 labels Mar 9, 2026
@ilejn
Collaborator

ilejn commented Mar 9, 2026

The test_export_replicated_mt_partition_to_object_storage/test.py::test_export_partition_from_replicated_database_uses_db_shard_replica_macros failure could be related to this PR.

Labels

antalya, antalya-26.1, port-antalya (PRs to be ported to all new Antalya releases)

3 participants